Comparison results are obtained for the inclusion probabilities in some unequal probability
sampling plans without replacement. For either successive sampling or Hajek's rejective sampling,
the larger the sample size, the more uniform the inclusion probabilities in the sense of
majorization. In particular, the inclusion probabilities are more uniform than the drawing
probabilities. For the same sample size, and given the same set of drawing probabilities, the
inclusion probabilities are more uniform for rejective sampling than for successive sampling.
This last result confirms a conjecture of Hajek (Sampling from a Finite Population (1981) Dekker).
Results are also presented in terms of the Kullback-Leibler divergence, showing that the
inclusion probabilities for successive sampling are more proportional to the drawing probabilities.

Keywords: conditional Poisson sampling; entropy; Hajek's conjecture; sampling without 
replacement; stochastic orders; total positivity order 



1. Introduction and main results

Consider a finite population indexed by $U = \{1, \ldots, N\}$. Let $\alpha = (\alpha_1, \ldots, \alpha_N)$,
$\sum_{i=1}^N \alpha_i = 1$, denote a set of drawing probabilities. In Hajek's [5, 6] rejective sampling,
independent draws are made with probabilities according to the same $\alpha$ until a sample of size $n$ is
obtained; whenever a duplicate appears, all draws are rejected and the process restarts.
Successive sampling, a closely related scheme, makes the same independent draws except
that whenever a duplicate appears, only the current draw is rejected and needs to be
redrawn. Mathematically, rejective sampling is equivalent to conditional Poisson sampling,
that is, independent sampling for each unit conditional on the sample size being $n$.
Conditional Poisson sampling possesses a maximum entropy property, among other desirable
properties, and has received considerable attention; see Chen, Dempster and Liu [3],
Berger [2], Traat, Bondesson and Meister [23], Arratia, Goldstein and Langholz [1], and
Qualite [15]. It also has interesting applications to modeling how players select lottery
tickets [22]. Successive sampling, on the other hand, has connections to areas such as
software reliability [11].

Unequal probability sampling may achieve considerable variance reduction if the
first-order inclusion probabilities are made proportional to a suitable auxiliary variable. For
either rejective sampling or successive sampling, however, the inclusion probabilities are
rather complicated and generally not proportional to the drawing probabilities $\alpha$. Thus
relationships between the inclusion probabilities and $\alpha$, either approximate or exact,
are of interest. This work considers exact qualitative comparisons. See Hajek [5] and
Rosen [18-20] for asymptotic results.

Denote the inclusion probabilities for rejective sampling by $\pi^R = (\pi_1^R, \ldots, \pi_N^R)$ and
those for successive sampling by $\pi^S = (\pi_1^S, \ldots, \pi_N^S)$. Hajek [6], page 97, conjectures the
following inequalities based on asymptotic considerations and numerical experience:

$$\frac{\max_i \alpha_i}{\min_i \alpha_i} \ge \frac{\max_i \pi_i^S}{\min_i \pi_i^S} \ge \frac{\max_i \pi_i^R}{\min_i \pi_i^R}.$$

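Milbrodt [14] proposes a strengthened conjecture,

$$n \max_i \alpha_i \ge \max_i \pi_i^S \ge \max_i \pi_i^R, \qquad (1)$$

$$n \min_i \alpha_i \le \min_i \pi_i^S \le \min_i \pi_i^R, \qquad (2)$$

and partially resolves it by showing

$$n \max_i \alpha_i \ge \max_i \pi_i^S, \qquad n \min_i \alpha_i \le \min_i \pi_i^S, \qquad (3)$$

$$n \max_i \alpha_i \ge \max_i \pi_i^R, \qquad n \min_i \alpha_i \le \min_i \pi_i^R.$$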

The inequalities (3) are also obtained by Rao, Sengupta and Sinha [16]. The inequalities
$\max_i \pi_i^S \ge \max_i \pi_i^R$ and $\min_i \pi_i^S \le \min_i \pi_i^R$ have remained open; see Milbrodt [14] for
numerical illustrations. Roughly speaking, both Hajek's conjecture and Milbrodt's strengthened
version say that the drawing probabilities are more variable than the inclusion
probabilities for successive sampling, which are themselves more variable than the inclusion
probabilities for rejective sampling.

Concerning successive sampling, Kochar and Korwar [12] obtain some comparison
results using the notion of majorization. A real vector $b = (b_1, \ldots, b_N)$ is said to majorize
$a = (a_1, \ldots, a_N)$, written as $a \prec b$, if

• $\sum_{i=1}^N a_i = \sum_{i=1}^N b_i$, and

• $\sum_{i=k}^N a_{(i)} \le \sum_{i=k}^N b_{(i)}$, $k = 2, \ldots, N$, where $a_{(1)} \le \cdots \le a_{(N)}$ and $b_{(1)} \le \cdots \le b_{(N)}$ are
$(a_1, \ldots, a_N)$ and $(b_1, \ldots, b_N)$ arranged in increasing order, respectively.

Kochar and Korwar [12] show that

$$n^{-1} \pi^S \prec \alpha, \qquad (4)$$

which strengthens (3). In general, majorization is a strong form of variability ordering. For
example, $a \prec b$ implies that $\sum_i \phi(a_i) \le \sum_i \phi(b_i)$ for any convex function $\phi$. See Marshall
and Olkin [13] for further properties and various applications of majorization.
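The definition of majorization above translates directly into a short check. The following is a minimal sketch, not part of the original paper; the function name `majorizes`, the tolerance `tol`, and the two example vectors are illustrative assumptions.

```python
from typing import Sequence


def majorizes(b: Sequence[float], a: Sequence[float], tol: float = 1e-12) -> bool:
    """Return True if b majorizes a (a ≺ b) under the definition in the text."""
    if len(a) != len(b):
        return False
    a_sorted = sorted(a)  # increasing order: a_(1) <= ... <= a_(N)
    b_sorted = sorted(b)
    # First condition: equal totals (up to a floating-point tolerance).
    if abs(sum(a_sorted) - sum(b_sorted)) > tol:
        return False
    # Second condition: tail sums sum_{i=k}^N for k = N, N-1, ..., 2.
    tail_a = tail_b = 0.0
    for x, y in zip(reversed(a_sorted[1:]), reversed(b_sorted[1:])):
        tail_a += x
        tail_b += y
        if tail_a > tail_b + tol:
            return False
    return True


# Illustrative vectors (assumptions, not taken from the paper).
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.1, 0.2, 0.3, 0.4]

print(majorizes(skewed, uniform))  # is uniform ≺ skewed?
print(majorizes(uniform, skewed))  # is skewed ≺ uniform?
```

The convex-function property can be spot-checked on the same pair, for instance with $\phi(x) = x^2$, by comparing `sum(x * x for x in uniform)` with `sum(x * x for x in skewed)`.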
This note presents some majorization results that refine previous work. As a
consequence, we prove Milbrodt's strengthening of Hajek's conjecture. Our main results are
summarized as follows.

Theorem 1. Given the drawing probabilities $\alpha$, let $\pi^R(n)$ (resp., $\pi^S(n)$) denote the
first-order inclusion probabilities for rejective sampling (resp., successive sampling) with
sample size $n \le N$. Define the "inclusion probabilities per draw" as $p^R(n) = n^{-1} \pi^R(n)$
and $p^S(n) = n^{-1} \pi^S(n)$. Then we have

$$(N^{-1}, \ldots, N^{-1}) = p^R(N) \prec \cdots \prec p^R(n) \prec \cdots \prec p^R(1) = \alpha, \qquad (5)$$

$$(N^{-1}, \ldots, N^{-1}) = p^S(N) \prec \cdots \prec p^S(n) \prec \cdots \prec p^S(1) = \alpha. \qquad (6)$$

Moreover,

$$\pi^R(n) \prec \pi^S(n). \qquad (7)$$



The ordering chains (5) and (6) are intuitively appealing. Given a set of drawing 
probabilities, larger sample sizes lead to inclusion probabilities that are more uniform 
for either rejective sampling or successive sampling. Moreover, (7) says that with the 
same sample size, the inclusion probabilities are more uniform for rejective sampling 
than for successive sampling. It is easy to see that (5)-(7) together imply Milbrodt's [14] 
conjecture, that is, (1) and (2). 
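For small populations, the orderings (5)-(7) can be checked numerically by enumerating all samples. The sketch below is illustrative only: the function names, the tolerance, and the drawing probabilities `alpha` are assumptions, and brute-force enumeration is used purely for transparency (it is exponential in $N$).

```python
from itertools import combinations, permutations
from math import prod


def rejective_inclusion(alpha, n):
    """pi^R(n): a size-n subset s has probability proportional to prod_{i in s} alpha_i."""
    pi = [0.0] * len(alpha)
    total = 0.0
    for s in combinations(range(len(alpha)), n):
        w = prod(alpha[i] for i in s)
        total += w
        for i in s:
            pi[i] += w
    return [x / total for x in pi]


def successive_inclusion(alpha, n):
    """pi^S(n): sum over ordered samples of distinct units; only duplicates are redrawn."""
    pi = [0.0] * len(alpha)
    for seq in permutations(range(len(alpha)), n):
        prob, remaining = 1.0, 1.0
        for i in seq:
            prob *= alpha[i] / remaining  # renormalise over units not yet drawn
            remaining -= alpha[i]
        for i in seq:
            pi[i] += prob
    return pi


def is_majorized_by(a, b, tol=1e-12):
    """True if a ≺ b: equal totals, and top-k sums of a never exceed those of b."""
    if abs(sum(a) - sum(b)) > tol:
        return False
    sa = sb = 0.0
    for x, y in zip(sorted(a, reverse=True)[:-1], sorted(b, reverse=True)[:-1]):
        sa += x
        sb += y
        if sa > sb + tol:
            return False
    return True


alpha = [0.4, 0.3, 0.2, 0.1]  # illustrative drawing probabilities
N = len(alpha)
p_R = {n: [x / n for x in rejective_inclusion(alpha, n)] for n in range(1, N + 1)}
p_S = {n: [x / n for x in successive_inclusion(alpha, n)] for n in range(1, N + 1)}

chain_R = all(is_majorized_by(p_R[n + 1], p_R[n]) for n in range(1, N))      # (5)
chain_S = all(is_majorized_by(p_S[n + 1], p_S[n]) for n in range(1, N))      # (6)
rej_vs_succ = all(is_majorized_by(p_R[n], p_S[n]) for n in range(1, N + 1))  # (7)
print(chain_R, chain_S, rej_vs_succ)
```

Since majorization controls the largest and smallest coordinates, the same arrays can be used to inspect (1) and (2) directly, for example by comparing `n * max(alpha)`, `max(successive_inclusion(alpha, n))` and `max(rejective_inclusion(alpha, n))`.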

We prove (5) and (7) in Section 2 using a combination of analytic and probabilistic
techniques. A key tool in resolving (7) is the likelihood ratio order between multivariate
densities [9]. A proof of (6), which slightly extends that of (4), is included for completeness.

The Shannon entropy is sometimes used to measure how uniform a distribution is. It is
defined as $H(p) = -\sum_{i=1}^N p_i \log p_i$ for a probability vector $p = (p_1, \ldots, p_N)$. By convention
$0 \log 0 = 0$. It is well known that $p \prec q$ implies $H(q) \le H(p)$. See Cover and Thomas [4],
Chapter 2, for further properties of this fundamental quantity. We note the following
direct consequence of Theorem 1.

Corollary 1. In the setting of Theorem 1,

$$\log N = H(p^R(N)) \ge \cdots \ge H(p^R(n)) \ge \cdots \ge H(p^R(1)) = H(\alpha),$$
$$\log N = H(p^S(N)) \ge \cdots \ge H(p^S(n)) \ge \cdots \ge H(p^S(1)) = H(\alpha),$$
$$H(p^R(n)) \ge H(p^S(n)).$$
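The link between majorization and entropy used in Corollary 1 can be illustrated on a small chain of vectors. This is a minimal sketch; the three vectors are illustrative assumptions (chosen so that uniform ≺ middle ≺ skewed) and are not inclusion probabilities computed from the paper.

```python
from math import log


def shannon_entropy(p):
    """H(p) = -sum_i p_i log p_i, with the convention 0 log 0 = 0."""
    return -sum(x * log(x) for x in p if x > 0)


# Illustrative chain (assumption): uniform ≺ middle ≺ skewed.
uniform = [0.25, 0.25, 0.25, 0.25]
middle = [0.3, 0.3, 0.2, 0.2]
skewed = [0.4, 0.3, 0.2, 0.1]

for name, p in [("uniform", uniform), ("middle", middle), ("skewed", skewed)]:
    # Entropy should not increase as the vectors become less uniform.
    print(name, shannon_entropy(p))
```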

Inequalities are also obtained in terms of the Kullback-Leibler divergence, which is
defined as

$$D(p\|q) = \sum_{i=1}^N p_i \log \frac{p_i}{q_i}$$

for two probability vectors $p = (p_1, \ldots, p_N)$ and $q = (q_1, \ldots, q_N)$. By convention
$x \log(x/0) = \infty$ for $x > 0$ and $0 \log(0/x) = 0$ for $x \ge 0$. A basic property is $D(p\|q) > 0$
unless $p = q$. We shall use $D(p\|q)$ purely as a discrepancy measure between probability
vectors without referring to its information-theoretic significance.

Theorem 2. In the setting of Theorem 1, let $1 \le l < m < n \le N$. Then we have

$$D(p^R(l)\|p^R(n)) \ge D(p^R(l)\|p^R(m)) + D(p^R(m)\|p^R(n)), \qquad (8)$$

$$D(p^R(n)\|p^R(l)) \ge D(p^R(m)\|p^R(l)) + D(p^R(n)\|p^R(m)), \qquad (9)$$

$$D(p^S(n)\|\alpha) \ge D(p^S(n)\|p^S(m)) + D(p^S(m)\|\alpha), \qquad (10)$$

$$D(p^R(n)\|\alpha) \ge D(p^R(n)\|p^S(n)) + D(p^S(n)\|\alpha). \qquad (11)$$

A number of results can be deduced from these (reverse) triangle inequalities. For
example, from (8) and (9) we obtain $D(p^R(m+1)\|\alpha) \ge D(p^R(m)\|\alpha)$ and $D(\alpha\|p^R(m+1)) \ge
D(\alpha\|p^R(m))$, showing that, for rejective sampling, the larger the sample size, the more
distorted the inclusion probabilities become as compared with the drawing probabilities.
Similarly, from (10) we obtain $D(p^S(m+1)\|\alpha) \ge D(p^S(m)\|\alpha)$. From (11) we obtain

$$D(p^R(n)\|\alpha) \ge D(p^S(n)\|\alpha). \qquad (12)$$

That is, for fixed $n$, the inclusion probabilities for successive sampling (rather than for
rejective sampling) are more proportional to the drawing probabilities. The inequality (12)
may be used to compute an upper bound on $D(p^S(n)\|\alpha)$ because, while $p^R(n)$ can be
calculated from $\alpha$ efficiently using a recursive formula (see [3]), numerical calculation
of $p^S(n)$ is considerably more difficult.
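To make the use of (12) concrete, here is a hedged sketch of computing the upper bound $D(p^R(n)\|\alpha)$. The recursion used is one that follows from the elementary-symmetric-function expression for rejective inclusion probabilities, namely $\pi_i(m) = m\,\alpha_i(1 - \pi_i(m-1)) / \sum_j \alpha_j(1 - \pi_j(m-1))$ with $\pi(0) = 0$; it is assumed here for illustration and is not claimed to be the exact form given in [3]. Function names and the example `alpha` are likewise assumptions.

```python
from math import inf, log


def rejective_per_draw(alpha, n):
    """p^R(n) = pi^R(n) / n, computed by recursion on the sample size.

    Assumed recursion: pi_i(m) = m * a_i * (1 - pi_i(m-1)) / sum_j a_j * (1 - pi_j(m-1)).
    """
    pi = [0.0] * len(alpha)
    for m in range(1, n + 1):
        w = [a * (1.0 - x) for a, x in zip(alpha, pi)]
        total = sum(w)
        pi = [m * x / total for x in w]
    return [x / n for x in pi]


def kl_divergence(p, q):
    """D(p||q) with the conventions 0 log(0/x) = 0 and x log(x/0) = inf for x > 0."""
    d = 0.0
    for x, y in zip(p, q):
        if x > 0:
            if y == 0:
                return inf
            d += x * log(x / y)
    return d


alpha = [0.4, 0.3, 0.2, 0.1]  # illustrative drawing probabilities
bounds = [kl_divergence(rejective_per_draw(alpha, n), alpha)
          for n in range(1, len(alpha) + 1)]
for n, bound in enumerate(bounds, start=1):
    # By (12), each value bounds D(p^S(n) || alpha) from above.
    print(n, bound)
```

The recursion costs $O(nN)$ operations, in contrast with the enumeration over ordered samples that a direct computation of $p^S(n)$ would need.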

The inequalities in Theorem 2 resemble the reverse triangle inequalities of Yu [27]. Our
results here concern the majorization ordering and may be regarded as first-order results;
those in Yu [27] use relative log-concavity and are second order. For related entropy and
divergence comparison results, see Karlin and Rinott [10], Johnson [8] and Yu [24-26].

The proof of Theorem 2 builds on Theorem 1 and is presented in Section 3. 

2. Proof of Theorem 1 

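Let $e_k(\cdot)$ denote the $k$th elementary symmetric function, that is,

$$e_k(\beta) = \sum_{1 \le j_1 < \cdots < j_k \le m} \beta_{j_1} \cdots \beta_{j_k}, \qquad \beta = (\beta_1, \ldots, \beta_m).$$

By convention, $e_0(\beta) = 1$ and $e_k(\beta) = 0$ if $k < 0$ or $k > m$. For a rejective sample of size $n$,
the probability that unit $i$ is included can be expressed as

$$\pi_i^R(n) = \frac{\alpha_i e_{n-1}(\alpha_{-i})}{e_n(\alpha)}, \qquad (13)$$

where

$$\alpha_{-i} = (\alpha_1, \ldots, \alpha_{i-1}, \alpha_{i+1}, \ldots, \alpha_N).$$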
The notation $\alpha_{-i,-j}$ (leave-two-out) is defined similarly. It is immediate that $\alpha_i \le \alpha_j$
implies $\pi_i^R(n) \le \pi_j^R(n)$. Henceforth, we assume $\alpha_1 \ge \cdots \ge \alpha_N > 0$ without loss of
generality.
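Formula (13) can be evaluated with a standard dynamic-programming recurrence for elementary symmetric functions. The sketch below is illustrative; the function names and the example `alpha` are assumptions, and the leave-one-out vector is rebuilt for each unit for clarity rather than speed.

```python
def elementary_symmetric(beta, kmax):
    """Return [e_0(beta), ..., e_kmax(beta)], adding one variable at a time."""
    e = [1.0] + [0.0] * kmax
    for b in beta:
        # Go downwards in k so that each variable is used at most once per product.
        for k in range(kmax, 0, -1):
            e[k] += b * e[k - 1]
    return e


def rejective_inclusion(alpha, n):
    """pi_i^R(n) = alpha_i * e_{n-1}(alpha_{-i}) / e_n(alpha), as in (13)."""
    e_n = elementary_symmetric(alpha, n)[n]
    pi = []
    for i, a in enumerate(alpha):
        rest = alpha[:i] + alpha[i + 1:]  # alpha_{-i}: leave unit i out
        pi.append(a * elementary_symmetric(rest, n - 1)[n - 1] / e_n)
    return pi


alpha = [0.4, 0.3, 0.2, 0.1]  # illustrative, already in decreasing order
pi = rejective_inclusion(alpha, 2)
print(pi)       # decreasing in i, as the text notes for decreasing alpha
print(sum(pi))  # inclusion probabilities sum to the sample size n
```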

The following Lemma 1 is needed in the proof of (5). 

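Lemma 1. Suppose probability vectors $p = (p_1, \ldots, p_N)$ and $q = (q_1, \ldots, q_N)$ satisfy

$$p_1 \ge \cdots \ge p_N > 0, \qquad \frac{q_1}{p_1} \ge \cdots \ge \frac{q_N}{p_N}.$$

Then $p \prec q$.

Proof. For $1 \le k < N$ we have

$$\frac{\sum_{i=1}^k q_i}{\sum_{i=1}^k p_i} \ge \frac{q_k}{p_k} \ge \frac{q_{k+1}}{p_{k+1}},$$

which yields

$$\frac{\sum_{i=1}^k q_i}{\sum_{i=1}^k p_i} \ge \frac{\sum_{i=1}^{k+1} q_i}{\sum_{i=1}^{k+1} p_i} \ge \cdots \ge \frac{\sum_{i=1}^N q_i}{\sum_{i=1}^N p_i}.$$

Because both vectors sum to one, the last ratio equals 1, so $\sum_{i=1}^k q_i \ge \sum_{i=1}^k p_i$ for each $k$.
Hence $p \prec q$ by definition (the conditions imply $q_1 \ge \cdots \ge q_N$). □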

Proof of (5). Let $p = p^R(n+1)$ and $q = p^R(n)$. Note that $\sum_{i=1}^N p_i = \sum_{i=1}^N q_i = 1$. Since
$\alpha_1 \ge \cdots \ge \alpha_N$, we have $p_1 \ge \cdots \ge p_N$. The desired relation $p \prec q$ would follow from
Lemma 1, if we can show that $q_1/p_1 \ge \cdots \ge q_N/p_N$, or, equivalently, $\pi_k^R(n)/\pi_k^R(n+1) \ge
\pi_{k+1}^R(n)/\pi_{k+1}^R(n+1)$ for $1 \le k < N$. The case $N = 2$ is trivial. Otherwise we have

$$\pi_k^R(n) = \frac{\alpha_k e_{n-1}(\alpha_{-k})}{e_n(\alpha)} = \frac{\alpha_k [\alpha_{k+1} e_{n-2}(\tilde{\alpha}) + e_{n-1}(\tilde{\alpha})]}{e_n(\alpha)}.$$

Thus

$$\frac{\pi_k^R(n)}{\pi_k^R(n+1)} = \frac{e_{n+1}(\alpha)}{e_n(\alpha)} f(\alpha_{k+1}), \qquad (14)$$

where $\tilde{\alpha} = \alpha_{-k,-(k+1)}$ and

$$f(x) = \frac{x e_{n-2}(\tilde{\alpha}) + e_{n-1}(\tilde{\alpha})}{x e_{n-1}(\tilde{\alpha}) + e_n(\tilde{\alpha})}.$$

Similarly

$$\frac{\pi_{k+1}^R(n)}{\pi_{k+1}^R(n+1)} = \frac{e_{n+1}(\alpha)}{e_n(\alpha)} f(\alpha_k). \qquad (15)$$

We have

$$f'(x) = \frac{e_{n-2}(\tilde{\alpha}) e_n(\tilde{\alpha}) - e_{n-1}^2(\tilde{\alpha})}{[x e_{n-1}(\tilde{\alpha}) + e_n(\tilde{\alpha})]^2} \le 0,$$

where the inequality follows from Newton's inequalities [7], page 52. That is, $f(x)$
decreases in $x$. Because $\alpha_{k+1} \le \alpha_k$, we deduce the inequality

$$\frac{\pi_k^R(n)}{\pi_k^R(n+1)} \ge \frac{\pi_{k+1}^R(n)}{\pi_{k+1}^R(n+1)}$$

from (14) and (15). □

The proof of (6) slightly extends and simplifies the arguments of Kochar and
Korwar [12].

Proof of (6). Let $S_1, S_2, \ldots \in \{1, \ldots, N\}$ be a sequence of draws retained in successive
sampling. It is well known that the inclusion probabilities and the drawing probabilities
are ordered in the same way, that is,

$$p_1^S(n) \ge \cdots \ge p_N^S(n), \qquad 1 \le n \le N \qquad (16)$$

(see [14]). For $1 \le k < N$ we have

$$\Pr(S_n \le k) - \Pr(S_{n+1} \le k)$$
$$= \Pr(S_n \le k, S_{n+1} > k) - \Pr(S_n > k, S_{n+1} \le k)$$
$$= \sum_{k_1 \le k,\, k_2 > k} E[\Pr(S_n = k_1, S_{n+1} = k_2 \mid S_1, \ldots, S_{n-1}) - \Pr(S_n = k_2, S_{n+1} = k_1 \mid S_1, \ldots, S_{n-1})],$$

where the expectation is with respect to $S_1, \ldots, S_{n-1}$. Because $\alpha_i$ decreases in $i$, it is
easy to show that $k_1 < k_2$ implies

$$\Pr(S_n = k_1, S_{n+1} = k_2 \mid S_1, \ldots, S_{n-1}) \ge \Pr(S_n = k_2, S_{n+1} = k_1 \mid S_1, \ldots, S_{n-1}).$$

Hence $\Pr(S_n \le k) \ge \Pr(S_{n+1} \le k)$ for all $1 \le n < N$. This is proved by Kochar and
Korwar [12] (see their Lemma 3.2) using a slightly more complicated argument. It follows
that

$$\sum_{i=1}^k p_i^S(n) = n^{-1} \sum_{j=1}^{n} \Pr(S_j \le k) \ge (n+1)^{-1} \sum_{j=1}^{n+1} \Pr(S_j \le k) = \sum_{i=1}^k p_i^S(n+1),$$

which proves (6) in view of (16). □
To prove (7), we recall the multivariate likelihood ratio order, also known as the total
positivity order (Karlin and Rinott [9], Rinott and Scarsini [17], Shaked and
Shanthikumar [21], Chapter 6). Consider the product space $\mathcal{X} = \{1, \ldots, N\}^n$. For $x = (x_1, \ldots, x_n) \in
\mathcal{X}$ and $y = (y_1, \ldots, y_n) \in \mathcal{X}$, write

$$x \vee y = (\max\{x_1, y_1\}, \ldots, \max\{x_n, y_n\}), \qquad x \wedge y = (\min\{x_1, y_1\}, \ldots, \min\{x_n, y_n\}).$$

Let $f$ and $g$ be density functions on $\mathcal{X}$. Then $f$ is said to be no smaller than $g$ in the
(multivariate) likelihood ratio order, written as $f \ge_{\mathrm{lr}} g$, if

$$f(x) g(y) \le f(x \vee y) g(x \wedge y), \qquad x, y \in \mathcal{X}.$$

This generalizes the univariate likelihood ratio order, which requires that the ratio of two
univariate densities is a monotone function.

A useful property of the likelihood ratio order is that it implies the usual stochastic
order. That is, if $X$ and $Y$ are random vectors taking values in $\mathcal{X}$, and $X \ge_{\mathrm{lr}} Y$ (we use the
notation $\ge_{\mathrm{lr}}$ with the random variables as well as their densities), then $E\phi(X) \ge E\phi(Y)$
for any coordinatewise increasing function $\phi$. In particular, each coordinate of $X$ is no
smaller than the corresponding coordinate of $Y$ in the usual stochastic order. Further
properties of $\ge_{\mathrm{lr}}$ include closure under marginalization; see Karlin and Rinott [9] and
Shaked and Shanthikumar [21], Chapter 6.

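Proof of (7). Recall that $\pi_1^R(n) \ge \cdots \ge \pi_N^R(n)$. By definition, (7) is proved if we can
show

$$\sum_{i=1}^k \pi_i^R(n) \le \sum_{i=1}^k \pi_i^S(n), \qquad k = 1, \ldots, N-1. \qquad (17)$$

Let $X = (X_1, \ldots, X_n)$ (resp., $Y = (Y_1, \ldots, Y_n)$) denote the unit indices arranged in
increasing order of a sample of size $n$ obtained by rejective sampling (resp., successive
sampling). That is, $X$ and $Y$ take values in $\Omega = \{(x_1, \ldots, x_n) \in \mathcal{X}: 1 \le x_1 < \cdots < x_n \le N\}$.
Then an unnormalized density of $X$ is

$$f(x) = \alpha_{x_1} \cdots \alpha_{x_n}, \qquad x = (x_1, \ldots, x_n) \in \Omega,$$

and the density of $Y$ can be written as

$$g(y) = \sum_{\sigma \in \mathrm{Perm}(y)} \alpha_{\sigma_1} \frac{\alpha_{\sigma_2}}{1 - \alpha_{\sigma_1}} \cdots \frac{\alpha_{\sigma_n}}{1 - \sum_{j=1}^{n-1} \alpha_{\sigma_j}}$$

$$= \sum_{\sigma \in \mathrm{Perm}(y)} \frac{\alpha_{y_1} \cdots \alpha_{y_n}}{(1 - \alpha_{\sigma_1})(1 - \alpha_{\sigma_1} - \alpha_{\sigma_2}) \cdots (1 - \sum_{j=1}^{n-1} \alpha_{\sigma_j})}, \qquad y = (y_1, \ldots, y_n) \in \Omega,$$

where $\sigma = (\sigma_1, \ldots, \sigma_n)$ and $\mathrm{Perm}(y)$ denotes the set of vectors obtained by permuting
the coordinates of $y$. Note that, for $x, y \in \Omega$ we have $x \vee y \in \Omega$ and $x \wedge y \in \Omega$. Moreover,
for $x, y \in \Omega$,

$$f(x) g(y) = \sum_{\sigma \in \mathrm{Perm}(y)} \frac{\alpha_{x_1} \cdots \alpha_{x_n} \alpha_{y_1} \cdots \alpha_{y_n}}{(1 - \alpha_{\sigma_1})(1 - \alpha_{\sigma_1} - \alpha_{\sigma_2}) \cdots (1 - \sum_{j=1}^{n-1} \alpha_{\sigma_j})}$$

$$\le \sum_{\sigma \in \mathrm{Perm}(x \wedge y)} \frac{\alpha_{x_1} \cdots \alpha_{x_n} \alpha_{y_1} \cdots \alpha_{y_n}}{(1 - \alpha_{\sigma_1})(1 - \alpha_{\sigma_1} - \alpha_{\sigma_2}) \cdots (1 - \sum_{j=1}^{n-1} \alpha_{\sigma_j})}$$

$$= f(x \vee y) g(x \wedge y),$$

where the inequality holds because $\alpha_i$ decreases in $i$ and, under an obvious bijection, each
element in $\mathrm{Perm}(y)$ is at least as large as its counterpart in $\mathrm{Perm}(x \wedge y)$. Thus $X \ge_{\mathrm{lr}} Y$.
It follows that

$$\Pr(X_j \le k) \le \Pr(Y_j \le k), \qquad j = 1, \ldots, n, \; k = 1, \ldots, N.$$

That is, $X_j$ is no smaller than $Y_j$ in the usual stochastic order. We have, for $1 \le k < N$,

$$\sum_{i=1}^k \pi_i^R(n) = \sum_{i=1}^k \sum_{j=1}^n \Pr(X_j = i)$$
$$= \sum_{j=1}^n \Pr(X_j \le k)$$
$$\le \sum_{j=1}^n \Pr(Y_j \le k)$$
$$= \sum_{i=1}^k \sum_{j=1}^n \Pr(Y_j = i)$$
$$= \sum_{i=1}^k \pi_i^S(n).$$

Thus (17) holds, and the proof is complete. □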
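The key step of the proof, the inequality $f(x)g(y) \le f(x \vee y)g(x \wedge y)$ on $\Omega$, can be checked exhaustively for a small population. This is a minimal sketch under stated assumptions: units are indexed from 0, `alpha` is an illustrative decreasing vector, and the function names and tolerance are hypothetical.

```python
from itertools import combinations, permutations
from math import prod


def f_rejective(x, alpha):
    """Unnormalized density of the ordered rejective sample x."""
    return prod(alpha[i] for i in x)


def g_successive(y, alpha):
    """Density of the ordered successive sample y: sum over draw orders in Perm(y)."""
    total = 0.0
    for sigma in permutations(y):
        prob, remaining = 1.0, 1.0
        for i in sigma:
            prob *= alpha[i] / remaining  # draw unit i among the units still available
            remaining -= alpha[i]
        total += prob
    return total


alpha = [0.4, 0.3, 0.2, 0.1]  # decreasing, as assumed in the proof
n = 2
omega = list(combinations(range(len(alpha)), n))  # strictly increasing index tuples

lr_holds = True
for x in omega:
    for y in omega:
        join = tuple(max(a, b) for a, b in zip(x, y))  # x ∨ y
        meet = tuple(min(a, b) for a, b in zip(x, y))  # x ∧ y
        lhs = f_rejective(x, alpha) * g_successive(y, alpha)
        rhs = f_rejective(join, alpha) * g_successive(meet, alpha)
        lr_holds = lr_holds and lhs <= rhs + 1e-12
print(lr_holds)
```

The coordinatewise maximum and minimum of two strictly increasing tuples are again strictly increasing, which is why `join` and `meet` can be passed to the same density functions.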

3. Proof of Theorem 2 

The following Lemma 2 is key to the proof of Theorem 2. 

Lemma 2. Let $p = (p_1, \ldots, p_N)$, $q = (q_1, \ldots, q_N)$ and $r = (r_1, \ldots, r_N)$ be probability
vectors with all positive coordinates. If either (a) $q \prec p$, $p_1 \ge \cdots \ge p_N$, and $q_1/r_1 \ge \cdots \ge
q_N/r_N$, or (b) $p \prec q$, $q_1 \ge \cdots \ge q_N$, and $q_1/r_1 \le \cdots \le q_N/r_N$, then

$$D(p\|r) \ge D(p\|q) + D(q\|r).$$



Sampling without replacement 

Proof. Let us assume (a). Case (b) is similar. We have

$$D(p\|r) - D(p\|q) - D(q\|r) = \sum_{i=1}^N (p_i - q_i) \log \frac{q_i}{r_i}$$

$$= \sum_{i=1}^{N-1} \left( \sum_{j=1}^i (p_j - q_j) \right) \left( \log \frac{q_i}{r_i} - \log \frac{q_{i+1}}{r_{i+1}} \right) \qquad (18)$$

$$\ge 0,$$

where the first equality follows from the definition of the Kullback-Leibler divergence,
the second equality holds by summation by parts, and the inequality holds because $q_i/r_i$
decreases in $i$ and $q \prec p$, and hence both parentheses in (18) are non-negative. □
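The summation-by-parts identity behind (18) is easy to confirm numerically. The following sketch uses three illustrative probability vectors chosen by hand to satisfy Case (a) of Lemma 2; the vectors and function names are assumptions, not values from the paper.

```python
from math import log


def kl(p, q):
    """D(p||q) for vectors with positive coordinates."""
    return sum(x * log(x / y) for x, y in zip(p, q) if x > 0)


def triangle_gap(p, q, r):
    """D(p||r) - D(p||q) - D(q||r), the left-hand side in the proof of Lemma 2."""
    return kl(p, r) - kl(p, q) - kl(q, r)


def gap_by_parts(p, q, r):
    """Right-hand side of (18), obtained by summation by parts."""
    total, partial = 0.0, 0.0
    for i in range(len(p) - 1):
        partial += p[i] - q[i]  # sum_{j <= i} (p_j - q_j)
        total += partial * (log(q[i] / r[i]) - log(q[i + 1] / r[i + 1]))
    return total


# Case (a), illustrative: q ≺ p, p decreasing, q_i / r_i decreasing.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.35, 0.25]
r = [0.3, 0.35, 0.35]

print(triangle_gap(p, q, r), gap_by_parts(p, q, r))  # the two values should agree
```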

As in Section 2, in the proofs of (8)-(11) we assume $\alpha_1 \ge \cdots \ge \alpha_N > 0$.

Proof of (8) and (9). Let $p = p^R(l)$, $q = p^R(m)$, $r = p^R(n)$. Then $p_1 \ge \cdots \ge p_N$. Since
$l < m$ we have $q \prec p$ by (5). From the proof of (5) we know that $q_1/r_1 \ge \cdots \ge q_N/r_N$.
Thus (8) follows from Lemma 2, Case (a). The proof of (9) is similar. □

To prove (10) and (11) we need the following result. 

Proposition 1. The ratio $p_i^S(n)/\alpha_i$, $i = 1, \ldots, N$, increases in $i$ for each $n \le N$.

Proof. Let $\pi_{i,k}$ denote the probability that the $k$th distinct draw in successive sampling
is unit $i$. Then $p_i^S(n) = n^{-1} \sum_{k=0}^{n-1} \pi_{i,k+1}$. It suffices to show that $\pi_{i,k+1}/\alpha_i$ increases in $i$
for each $k$. Let us assume $k \ge 1$ and define the index set

$$\Omega(i) = \{(j_1, \ldots, j_k): 1 \le j_l \le N, \; j_l \ne i, \; 1 \le l \le k, \text{ and } j_l \text{ are distinct}\}.$$

Then we have

$$\frac{\pi_{i,k+1}}{\alpha_i} = \sum_{(j_1, \ldots, j_k) \in \Omega(i)} \alpha_{j_1} \frac{\alpha_{j_2}}{1 - \alpha_{j_1}} \cdots \frac{\alpha_{j_k}}{1 - \sum_{l=1}^{k-1} \alpha_{j_l}} \left( \frac{1}{1 - \sum_{l=1}^{k} \alpha_{j_l}} \right). \qquad (19)$$

The summand is a decreasing function in $(j_1, \ldots, j_k)$, since $\alpha_j$ decreases in $j$. Consider
a mapping $\Omega(i) \to \Omega(i+1)$ that sends $(j_1, \ldots, j_k) \in \Omega(i)$ to $(j_1^*, \ldots, j_k^*) \in \Omega(i+1)$ as
follows. For $l = 1, \ldots, k$, if $j_l \ne i+1$, let $j_l^* = j_l$; otherwise let $j_l^* = i$. It is easy to see
that this mapping is well defined and is a bijection. Note that $j_l^* \le j_l$. Hence the
right-hand side of (19) increases if we replace the summation index $\Omega(i)$ by $\Omega(i+1)$. That is,
$\pi_{i,k+1}/\alpha_i$ increases in $i$, as required. □
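The monotonicity established in this proof can be inspected numerically by enumerating ordered draws. The sketch below is an illustration only; the function name `kth_draw_probs`, the 0-based unit indexing, and the example `alpha` are assumptions.

```python
from itertools import permutations


def kth_draw_probs(alpha, k):
    """pi_{i,k}: probability that the k-th distinct draw in successive sampling is unit i."""
    out = [0.0] * len(alpha)
    for seq in permutations(range(len(alpha)), k):
        prob, remaining = 1.0, 1.0
        for j in seq:
            prob *= alpha[j] / remaining  # draw j among the units not yet selected
            remaining -= alpha[j]
        out[seq[-1]] += prob  # the k-th distinct draw is the last entry of seq
    return out


alpha = [0.4, 0.3, 0.2, 0.1]  # decreasing, as assumed in the proof
ratio_table = {}
for k in range(1, len(alpha) + 1):
    ratio_table[k] = [x / a for x, a in zip(kth_draw_probs(alpha, k), alpha)]
    # Each list should be nondecreasing in the unit index.
    print(k, ratio_table[k])
```

Averaging the rows for $k = 1, \ldots, n$ gives $p_i^S(n)/\alpha_i$, so nondecreasing rows imply the statement of Proposition 1 for this example.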

Proof of (10) and (11). Let $p = p^R(n)$, $q = p^S(n)$ and $r = \alpha$. By Proposition 1, $q_1/r_1 \le
\cdots \le q_N/r_N$. By (7) we have $p \prec q$. Thus (11) follows from Lemma 2, Case (b). The proof
of (10) is similar. □